Circumventing Data Quality Problems Using Multiple Join Paths
Abstract
We propose the Multiple Join Path (MJP) framework for obtaining high-quality information by linking fields across multiple databases when the underlying databases contain poor-quality data, characterized by violations of integrity constraints such as keys and functional dependencies within and across databases. MJP associates quality scores with candidate answers in two steps: it first scores individual data paths between a pair of field values, taking into account data quality with respect to the specified integrity constraints, and then agglomerates scores across multiple data paths that serve as corroborating evidence for a candidate answer. We address the problem of finding the top-few (highest-quality) answers in the MJP framework using novel techniques, and demonstrate the utility of these techniques using real data and our Virtual Integration Prototype testbed.
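The two-step scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring functions, so everything here is an assumption made for illustration: the function names (`path_score`, `agglomerate`, `top_few`) are hypothetical, each join step is assumed to carry a quality value in [0, 1], a path is scored as the product of its step qualities, and corroborating paths are combined with a noisy-OR. The paper's own scoring and top-few techniques may differ.

```python
import heapq
from collections import defaultdict
from math import prod


def path_score(step_qualities):
    """Score one data path as the product of its per-join quality values.

    Assumption: each value is in [0, 1] and reflects how well that join step
    respects the integrity constraints (keys, functional dependencies).
    """
    return prod(step_qualities)


def agglomerate(path_scores):
    """Combine scores of corroborating paths with a noisy-OR (an assumed rule).

    More independent paths to the same answer push the combined score up.
    """
    return 1.0 - prod(1.0 - s for s in path_scores)


def top_few(paths, k):
    """Return the k highest-scoring (answer, score) pairs.

    paths: iterable of (candidate_answer, [step quality, ...]) tuples,
    one tuple per data path that reaches that candidate answer.
    """
    scores_by_answer = defaultdict(list)
    for answer, step_qualities in paths:
        scores_by_answer[answer].append(path_score(step_qualities))
    combined = [(a, agglomerate(s)) for a, s in scores_by_answer.items()]
    return heapq.nlargest(k, combined, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical candidate answers reached through different join paths.
    example_paths = [
        ("alice", [0.9, 0.8]),  # two-step path
        ("alice", [0.5]),       # second, weaker path corroborating "alice"
        ("bob", [0.8]),         # single direct path
        ("carol", [0.6, 0.5]),
    ]
    for answer, score in top_few(example_paths, 2):
        print(answer, round(score, 3))
```

In this sketch a candidate reached by two moderate paths can outrank a candidate reached by one stronger path, which is the corroboration effect the abstract describes. The exhaustive scoring shown here is only a baseline; it does not attempt the paper's techniques for finding the top-few answers efficiently.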
Similar Resources
The Bellman Data Quality Browser
Keynote Talk Abstract Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns. Commonly encountered problems include missing data (null values), duplicates and default values in columns supposed to be treated as keys, data inconsistencies (violation of functional dependencies), and poor quality join paths (lack ...
Using MOLP based procedures to solve DEA problems
Data envelopment analysis (DEA) is a technique used to evaluate the relative efficiency of comparable decision making units (DMUs) with multiple inputs and outputs. It computes a scalar measure of efficiency and discriminates between efficient and inefficient DMUs. It can also provide reference units for inefficient DMUs without consideration of the decision makers' (DMs) preferences. In this paper, ...
Join Constraints
Many application domains involve constraints that, at a conceptual modeling level, apply to one or more schema paths, each of which involves one or more conceptual joins (where the same conceptual object plays roles in two relationships). Popular information modeling approaches typically provide only weak support for such join constraints. This paper contrasts how join constraints are catered f...
Constraints on Conceptual Join Paths
To ensure that a software system accurately reflects the business domain that it models, the system needs to enforce the business rules (constraints and derivation rules) that apply to that domain. From a conceptual modeling perspective, many application domains involve constraints over one or more conceptual schema paths that include one or more conceptual joins (where the same conceptual obje...
Multicast State Distribution by Joins Using Multiple Shortest Paths
The lack of resources in routers will become a crucial issue with the deployment of state storing protocols. In particular, single or any source multicast protocols will most probably take over large amounts of resources for maintaining multicast tree information. The aim of this paper is to study the possibility and benefit of using multiple shortest paths in order for a new member to reach a ...